AITopics | Flathead County

Collaborating Authors

Flathead County

OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence

Yuan, Long, Mo, Fengran, Huang, Kaiyu, Wang, Wenjie, Zhai, Wangyuxuan, Zhu, Xiaoyu, Li, You, Xu, Jinan, Nie, Jian-Yun

arXiv.org Artificial IntelligenceMar-20-2025

The rapid advancement of multimodal large language models (LLMs) has opened new frontiers in artificial intelligence, enabling the integration of diverse large-scale data types such as text, images, and spatial information. In this paper, we explore the potential of multimodal LLMs (MLLM) for geospatial artificial intelligence (GeoAI), a field that leverages spatial data to address challenges in domains including Geospatial Semantics, Health Geography, Urban Geography, Urban Perception, and Remote Sensing. We propose a MLLM (OmniGeo) tailored to geospatial applications, capable of processing and analyzing heterogeneous data sources, including satellite imagery, geospatial metadata, and textual descriptions. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems. Results demonstrate that our model outperforms task-specific models and existing LLMs on diverse geospatial tasks, effectively addressing the multimodality nature while achieving competitive results on the zero-shot geospatial tasks. Our code will be released after publication.

large language model, machine learning, omnigeo, (21 more...)

arXiv.org Artificial Intelligence

2503.16326

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Beijing > Beijing (0.06)
(20 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.69)
Government (0.68)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.56)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Contextual Document Embeddings

Morris, John X., Rush, Alexander M.

arXiv.org Artificial IntelligenceNov-8-2024

Dense document embeddings are central to neural retrieval. The dominant paradigm is to train and construct embeddings by running encoders directly on individual documents. In this work, we argue that these embeddings, while effective, are implicitly out-of-context for targeted use cases of retrieval, and that a contextualized document embedding should take into account both the document and neighboring documents in context - analogous to contextualized word embeddings. We propose two complementary methods for contextualized document embeddings: first, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss; second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation. Results show that both methods achieve better performance than biencoders in several settings, with differences especially pronounced out-of-domain. We achieve state-of-the-art results on the MTEB benchmark with no hard negative mining, score distillation, dataset-specific instructions, intra-GPU example-sharing, or extremely large batch sizes. Our method can be applied to improve performance on any contrastive learning dataset and any biencoder.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.02525

Country:

North America > United States > Montana > Flathead County (0.04)
North America > United States > Michigan > Iosco County (0.04)
North America > United States > California (0.04)
(16 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Law (1.00)
Leisure & Entertainment > Sports > Football (0.67)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Embedding And Clustering Your Data Can Improve Contrastive Pretraining

Merrick, Luke

arXiv.org Artificial IntelligenceJul-26-2024

Recent studies of large-scale contrastive pretraining in the text embedding domain show that using single-source minibatches, rather than mixed-source minibatches, can substantially improve overall model accuracy. In this work, we explore extending training data stratification beyond source granularity by leveraging a pretrained text embedding model and the classic k-means clustering algorithm to further split training data apart by the semantic clusters within each source. Experimentally, we observe a notable increase in NDCG@10 when pretraining a BERT-based text embedding model on query-passage pairs from the MSMARCO passage retrieval dataset. Additionally, we conceptually connect our clustering approach to both the Topic Aware Sampling (TAS) aspect of the TAS-B methodology and the nearest-neighbor-based hard-negative mining aspect of the ANCE methodology and discuss how this unified view motivates future lines of research on the organization of contrastive pretraining data.

contrastive pretraining, dataset, embedding and clustering, (13 more...)

arXiv.org Artificial Intelligence

2407.18887

Country:

North America > United States > Montana > Flathead County > Kalispell (0.14)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
North America > Canada (0.04)
(18 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback